Enhancing biological information systems with granularity
Abstract
Abstraction is related to granularity, which is the process of going from a detailed to a simplified representation, i.e. from fine-grained to coarse-grained levels of granularity. Without pre-defined levels, application support for abstraction is limited to manual procedures, syntax, and heuristics [4].

3. Proposed solution

3.1 The vision

The aim of this research is to develop a formal, domain- and implementation-independent theory of granularity that can, and will, be used for computational reasoning in different subject domains. This theory is positioned orthogonally to, and added to, the data sources (data and knowledge bases and ontologies) to enhance data and information management and inferencing across levels of granularity. The theory and prototype implementation will be sufficiently comprehensive to be useful in the subject domain of biology.

An overview of the basic architecture is depicted in Figure 1. The source data on the left-hand side provides the content that is loaded into the domain granularity framework, or the data source receives an additional layer of logic to enhance querying and inferencing. The meta-granularity is on the right-hand side, where it is specified what the granularity components are, how adjacent levels relate to each other, and how the components (level, perspective, domain) are related. The definitions and constraints at this meta-level restrict the specification of each domain granularity framework (in the centre of the figure) to ensure consistency of the domain perspectives and levels with the theory. Together, these three major components will result in a robust characterisation and implementation of granularity.

Figure 1. Main components of, and related with, granularity that are analysed in this research. D: domain, GP: granular perspective, GPGL: granular level of a perspective, RL: relation between two adjacent levels, di: subject domain, gpi: perspective in the subject domain, gp1gl1 ... gpngln: levels defined for each perspective, rli: relation between two levels in the domain granularity framework.

3.2 Achievements and planned research

The main theoretical contributions I have made to date to realise this vision are the disambiguation of types of granularity into a taxonomy and a data structure internal to granular levels [6], ontological considerations, and a FOL characterisation of the domain-independent theory of granularity and its data manipulation operators, including abstraction [4]. On the experimental side, informal [5] and formal perspectives, levels, and data manipulation operators for the infectious diseases domain were defined and tested using DL [8]. Granular information retrieval was tested with the Foundational Model of Anatomy and the Gene Ontology, and its limitations concerning representation and implementation are discussed in [7].

Planned research involves: 1) refining the FOL characterisation of the theory and providing a more comprehensive justification of the modelling considerations and decisions (indistinguishability, and formal aggregation/part-of), 2) improving the feasibility of data manipulations (e.g. recursive queries) and ontology linking, 3) test cases with infectious diseases and nuclear hormone receptors, and 4) assessing practicability with DL, OWL, RDF, and SPARQL.
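To give a flavour of the kind of axioms such a FOL characterisation of granularity contains, the sketch below states that every granular level belongs to exactly one granular perspective and that the relation between adjacent levels is asymmetric and intransitive. These formulas are illustrative only and are not the actual axioms of [4]; the predicate names are chosen here for readability.

```latex
% Illustrative axioms only; not the characterisation published in [4].
% GL(x): x is a granular level; GP(y): y is a granular perspective;
% in(x,y): level x belongs to perspective y;
% RL(x,y): level x is one level finer than level y (adjacent levels).
\[ \forall x \bigl( GL(x) \rightarrow \exists! y \, ( GP(y) \wedge in(x,y) ) \bigr) \]
\[ \forall x, y \bigl( RL(x,y) \rightarrow \neg RL(y,x) \bigr) \]
\[ \forall x, y, z \bigl( ( RL(x,y) \wedge RL(y,z) ) \rightarrow \neg RL(x,z) \bigr) \]
```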
4. Comparison of the proposed solution to existing approaches

Summarising the three groups of shortcomings in existing approaches, described in §2, which relate to the problems briefly outlined in §1 and which the solution proposed in §3 shall address:

A. The informal approaches are not usable for computation and reasoning, are (philosophically) inconsistent, and are underspecified. Solution: development of a domain-independent theory of granularity, comprising the disambiguation between scale- and non-scale-dependent types of granularity and the ontologically motivated modelling decisions necessary for a generic, ontologically sound formal foundation of granularity, which enables automated reasoning and precise specification of domain granularity.

B. The formal approaches are more or less compatible partial solutions to what granularity is and how to use it, are not context-aware, and have neither formalised nor tested what one can do with granularity; they are therefore only of limited use. Solution: foundational semantics, formal representations, and operators are related through a unifying theory that maps to a model-theoretic semantics usable to specify and constrain domain granularity frameworks, which is necessary for computation and interoperability. The data manipulation operators enable querying and reasoning over the databases, ontologies-stored-in-databases, knowledge bases, and (OWL) ontologies to which granularity is applied, providing a novel method to analyse information as opposed to only describing it.

C. The engineering solutions are not reusable in their current form beyond the software application each one is designed for, with the familiar problems in interoperability. Solution: the domain- and implementation-independent theory of granularity ensures genericity and the widest possible applicability, such that multiple divergent uses share a common, well-founded framework. For instance, the research will be useful in (biological) database management, cross-granularity querying and reasoning, ontology and pathway management with 'browsing in context', and fact finding, and it will support other computational implementations such as information retrieval from large corpora.

5. Benefits of the research

The combination of maximum expressivity and computability takes advantage of both foundational ontology aspects and engineering usefulness for implementations. The domain- and implementation-independence and formal rigour will be more comprehensive than what abstraction and granularity achieve separately and will be applicable to a wider variety of software applications, thereby increasing usability and reusability. Existing 'context-aware' ontology browsing and agent mediation is limited, ad hoc, and difficult to scale up, but will become much easier with the proposed theory. The theory can make the huge amounts of data and the large ontologies in the LSSW manageable and understandable, because the user will be able to 'zoom in' to the desired section and level(s), hiding irrelevant information, while at the back-end this information remains linked and usable for inferencing and for other users with different foci. The additional logic layer, integrated with Semantic Web technologies, enables data manipulation through recombination of the information, so that more information can be retrieved from the same data source, gaps and errors in the data source can be found and corrected more easily, or an impetus for wet-lab research is provided.
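As a rough illustration of the intended 'zoom' behaviour, the following minimal sketch uses invented toy data (the entity names are hypothetical and are not taken from the FMA or GO, and this is not the prototype implementation): entities in a part-of hierarchy are assigned to granular levels, and anything finer than the requested level is collapsed onto its coarser-grained ancestor, hiding the lower-level detail while keeping it available at the back-end.

```python
# Minimal sketch of level-based 'zooming' over a part-of hierarchy.
# Toy data; the levels, entities, and relations are hypothetical.

# Levels of one granular perspective, ordered from coarse to fine.
LEVELS = ["organ", "tissue", "cell"]

# part_of[child] = parent, with each entity assigned to exactly one level.
PART_OF = {"hepatocyte": "liver lobule", "liver lobule": "liver"}
LEVEL_OF = {"liver": "organ", "liver lobule": "tissue", "hepatocyte": "cell"}


def zoom_out(entity: str, target_level: str) -> str:
    """Return the ancestor of `entity` that resides at `target_level`."""
    current = entity
    while LEVELS.index(LEVEL_OF[current]) > LEVELS.index(target_level):
        current = PART_OF[current]  # climb one granular level at a time
    return current


if __name__ == "__main__":
    # Present a cell-level query result at organ level.
    print(zoom_out("hepatocyte", "organ"))  # -> liver
```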
6. Research methodology

Implicitly, the aims in §3.1 contain the hypothesis that there exists a 'granular view' on reality, or, more strongly, that reality is granular, and that this view can be represented in one encompassing theory. Consequently, the hypothesis may be falsified a) if the theory of granularity requires exceptions for each subject domain, which would suggest that there is no underlying framework after all or that the theory is insufficient, or b) by inadequate applicability to any subject domain, which would indicate that the theory is not properly grounded. To test the hypothesis, I do not take a purely top-down or bottom-up approach, but follow one that is mainly top-down with several bottom-up examples taken primarily from the biology subject domain, in order to provide intermediate feedback during the definition of the theory and on its applicability (see also the achievements described in §3.2).

First, a preliminary investigation comprised a literature and application search, including a possible solution extending contextual reasoning, and experimentation with domains such as infectious diseases and some examples in ecology. This aided the understanding of perspectives, their criteria, and levels, and the formulation of the research questions. Second, the coarse-grained formulation of the research questions (sub-questions and tasks are omitted due to space limitations), which will be answered through iterations between developing a formal theory and experimentation with biology case studies:

1. Can a domain-independent theory of granularity be defined and formalised?
2. How can this theory of granularity constrain domain-specific granularity? (see the sketch after this list)
3. How can domain data be loaded into, or applied to, the domain-specific granularity framework?
4. What reasoning tasks can benefit from granularity? Where and how will this affect the identified types of tasks and the development of a model theory?
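As a minimal sketch of what constraining domain-specific granularity could look like in an implementation, the example below validates a hypothetical domain specification against two simple meta-level constraints: a perspective must contain at least two levels, and a level may belong to only one perspective. The class names, fields, and checks are illustrative assumptions, not the actual domain granularity framework.

```python
# Minimal sketch (hypothetical structures) of meta-level constraints
# restricting a domain granularity specification.
from dataclasses import dataclass, field


@dataclass
class GranularPerspective:
    name: str
    levels: list[str] = field(default_factory=list)  # ordered coarse -> fine


@dataclass
class DomainGranularityFramework:
    domain: str
    perspectives: list[GranularPerspective] = field(default_factory=list)

    def check(self) -> list[str]:
        """Return violations of two illustrative meta-level constraints."""
        problems: list[str] = []
        level_owner: dict[str, str] = {}
        for gp in self.perspectives:
            if len(gp.levels) < 2:
                problems.append(f"{gp.name}: a perspective needs at least two levels")
            for level in gp.levels:
                if level in level_owner:
                    problems.append(f"{level}: already assigned to {level_owner[level]}")
                level_owner[level] = gp.name
        return problems


if __name__ == "__main__":
    # Hypothetical example loosely inspired by structural anatomy.
    anatomy = GranularPerspective("anatomical structure", ["organ", "tissue", "cell"])
    framework = DomainGranularityFramework("human anatomy", [anatomy])
    print(framework.check() or "specification satisfies the meta-level constraints")
```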